EXECUTIVE SUMMARY


The Armed Services Vocational Aptitude Battery (ASVAB) is used for
selection and classification of enlisted personnel. The ASVAB is useful
to DOD because of its predictive validity (i.e., its ability to predict
performance on the job). A computerized adaptive testing (CAT) version
of the ASVAB has been developed. Through computerized testing, the value
of the ASVAB may be increased by adding new tests that cannot be
administered at present. The utility of adding such tests has been
estimated in a cost/benefit analysis of the CAT-ASVAB project [2].

The formulas used in calculating the benefit of new predictors
require simplifying assumptions. Such assumptions are bound to be
violated to some extent in reality. If a formula is sensitive to
violations of its assumptions, the actual benefit may be quite different
from the value given by the formula.

This research memorandum uses data from the Marine Corps Job
Performance Measurement Project and from military applicants tested in
late 1984. It compares benefits calculated in five different ways. The
results show that the calculated benefit may halve or double from one
formula to another. Thus, the benefit estimate depends strongly on how
it is calculated. Such unstable estimates are not useful in making
operational decisions.

INTRODUCTION


The Armed Services Vocational Aptitude Battery (ASVAB) is used for
selection and classification of enlisted personnel. It contains ten
subtests: General Science (GS), Arithmetic Reasoning (AR), Word
Knowledge (WK), Paragraph Comprehension (PC), Numerical Operations (NO),
Coding Speed (CS), Auto and Shop Information (AS), Mathematics Knowledge
(MK), Mechanical Comprehension (MC), and Electronics Information (EI).
An eleventh subtest, Verbal (VE), is defined as the sum of WK and PC.
Standard  scores  rather  than  raw  scores  on  the  subtests  are  used  in  all 
decisions  based  on  the  ASVAB.  Standard  scores  are  integers  from  20  to 
80,  with  mean  50  and  standard  deviation  10  in  the  1980  reference 
population [1]. Standard scores on subtests are combined into the Armed
Forces Qualification Test (AFQT) score, which is the same for all
services, and into occupational composites that vary from one service to
another. The AFQT is the primary score for selection of an applicant
for enlistment, while composite scores are used to classify a recruit
into one of the available military occupational specialties (MOSs).

A  computerized  adaptive  testing  (CAT)  version  of  the  ASVAB  has  been 
developed.  In  CAT,  a  computer  program  selects  items  for  an  examinee  on 
the  basis  of  available  information  about  the  examinee’s  ability.  Thus, 
a capable examinee's time is not wasted on easy items nor a
below-average examinee's time on difficult items. As a result, CAT can
achieve as much precision as the conventional paper-pencil (PP) version
of a test with fewer items. On the average, the CAT-ASVAB takes about
half  as  long  as  the  PP-ASVAB. 

The  ASVAB  is  useful  to  DOD  because  of  its  predictive  validity 
(i.e.,  its  ability  to  predict  performance  on  the  job).  Recruits 
selected  using  the  ASVAB  perform  better  than  those  selected  at  random. 
The value of the ASVAB will increase if it is improved by adding new
tests that measure traits such as perceptual and psychomotor
abilities. The utility of adding such tests has been estimated in a
cost/benefit analysis of the CAT-ASVAB project ([2], tab E) using the
"Cronbach-Gleser formula." One can derive a number of such formulas
that differ in the number of simplifying assumptions required. These
assumptions,  although  reasonable,  are  likely  to  be  wrong  to  some 
extent.  This  research  memorandum  demonstrates  the  sensitivity  of  these 
formulas  to  violations  of  their  assumptions.  When  the  additional 
validity  due  to  the  new  tests  is  small,  the  effect  of  departure  from  the 
assumptions  may  be  of  the  same  size  as  the  utility  being  calculated. 

CALCULATING  PERFORMANCE  GAIN 

A  utility  analysis  attempts  to  estimate  the  performance  gain  that 
will  result  from  replacing  one  composite  with  a  different,  more  valid 
composite  in  a  future  population  of  applicants.  This  analysis  is  based 
on  information  from  one  or  more  validity  studies  relating  test  scores  to 
some  measure  of  performance.  In  such  a  study,  correlations  between 
ASVAB  subtests  and  the  performance  measure  are  calculated  from  data  on  a 
sample  of  enlisted  personnel.  Enlisted  personnel  have  been  selected 
previously  using  their  ASVAB  scores.  As  a  result,  their  scores  have  a 
smaller  spread  than  scores  for  the  national  population.  Therefore,  the 
sample statistics are adjusted for range restriction. The corrected
means,  standard  deviations,  and  correlations  describe  the  results  that 
would  have  been  obtained  if  performance  could  be  measured  in  the  entire 
national  population.  These  values  can  be  used  to  calculate  the 
corrected  validity  (i.e.,  the  correlation  with  performance  in  the 
national  population)  of  any  composite  score  of  interest. 

The  central  problem  in  utility  analysis  is  to  apply  this  knowledge 
to  an  unknown  applicant  population  of  the  future.  Because  score 
distributions  in  such  a  population  are  unknown,  one  must  rely  on 
assumptions. Different assumptions lead to different formulas. If all
assumptions  and  hence  all  formulas  are  correct,  they  will  yield  almost 
the  same  value  for  the  performance  gain.  This  paper  shows  that  this 
does  not  happen  in  a  large  sample  of  applicants  tested  in  1984,  which 
means  that  the  simplifying  assumptions  are  incorrect  in  this  sample. 

The  paper  examines  how  the  calculated  performance  gain  changes  as  more 
and  more  information  from  the  sample  is  used. 

DATA  FOR  ILLUSTRATION 

A  realistic  example  for  calculating  performance  gains  is  provided 
by  the  recent  change  in  the  composition  of  the  AFQT.  Until  31  December 
1988,  the  AFQT  contained  subtests  AR,  WK,  PC,  and  NO.  This  will  be 
referred  to  as  the  old  AFQT.  Its  raw  score  is  given  by 

OLD_RAW = AR + WK + PC + NO/2 .  (1)

The  new  AFQT,  implemented  on  1  January  1989,  contains  MK  instead  of  NO, 
and  uses  standard  scores  instead  of  raw  scores.  Thus,  the  sum  of 
standard  scores  is 


NEW_SSS = 2 SVE + SAR + SMK ,  (2)

where  SVE  is  the  standard  score  on  VE,  and  so  on.  These  scores  were 
standardized  in  this  study  so  as  to  have  a  mean  of  0  and  a  standard 
deviation of 100 in the reference population [1]. The standardized
scores  will  be  referred  to  as  OLD  and  NEW. 
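
As a minimal sketch, equations 1 and 2 can be written as two small functions. The `rescale` helper is a hypothetical illustration of the final standardization step: the reference-population mean and standard deviation it needs are not reported in this memorandum, so the values passed to it below are placeholders only.

```python
def old_raw(ar: float, wk: float, pc: float, no: float) -> float:
    """Raw score of the old AFQT (equation 1); NO counts at half weight."""
    return ar + wk + pc + no / 2


def new_sss(sve: float, sar: float, smk: float) -> float:
    """Sum of standard scores for the new AFQT (equation 2); VE counts double."""
    return 2 * sve + sar + smk


def rescale(score: float, ref_mean: float, ref_sd: float) -> float:
    """Rescale to mean 0 and standard deviation 100 in the reference population.

    ref_mean and ref_sd are assumptions supplied by the caller; the memorandum
    does not list them.
    """
    return 100 * (score - ref_mean) / ref_sd


# Illustrative inputs only (not data from the study).
print(old_raw(ar=20, wk=30, pc=10, no=40))
print(new_sss(sve=50, sar=50, smk=50))
print(rescale(250, ref_mean=200, ref_sd=25))  # placeholder reference values
```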

Data from the Marine Corps Job Performance Measurement (JPM) Project
provided scores on a hands-on performance test (HOPT). HOPT scores were
standardized to have a standard deviation of 10 in the reference
population. The sample from MOS 0351 (Assaultman) was used because it
provided the highest incremental validity for the new AFQT over the old
AFQT. After eliminating the effect of time-in-service, the predicted
HOPT score was given by

HOPT_PRED = 7.9104 + 0.0047 SGS + 0.1293 SAR + 0.0053 SWK - 0.0429 SPC
            - 0.1564 SNO + 0.0901 SCS + 0.1066 SAS + 0.1667 SMK
            + 0.1291 SMC + 0.2209 SEI .  (3)

The  standard  error  of  estimate  was  8.088. 
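
The regression in equation 3 can be sketched as a lookup of coefficients by subtest. The names `COEFFICIENTS` and `hopt_pred` are illustrative, and the signs follow the equation as printed (negative weights on PC and NO). Inputs are standard scores on the 20 to 80 scale.

```python
INTERCEPT = 7.9104

# Coefficients of equation 3, keyed by subtest abbreviation.
COEFFICIENTS = {
    "GS": 0.0047, "AR": 0.1293, "WK": 0.0053, "PC": -0.0429, "NO": -0.1564,
    "CS": 0.0901, "AS": 0.1066, "MK": 0.1667, "MC": 0.1291, "EI": 0.2209,
}


def hopt_pred(standard_scores: dict) -> float:
    """Predicted hands-on performance score from ten subtest standard scores."""
    missing = COEFFICIENTS.keys() - standard_scores.keys()
    if missing:
        raise KeyError(f"missing subtests: {sorted(missing)}")
    return INTERCEPT + sum(
        weight * standard_scores[name] for name, weight in COEFFICIENTS.items()
    )


# An examinee at the reference-population mean (50) on every subtest.
average = {name: 50 for name in COEFFICIENTS}
print(round(hopt_pred(average), 4))

# Raising MK by one standard deviation (10 points), all else equal.
stronger_mk = dict(average, MK=60)
print(round(hopt_pred(stronger_mk) - hopt_pred(average), 4))
```

Because NO carries a negative weight and MK a positive one, a composite that swaps NO for MK lines up better with this equation, which is in keeping with the higher validity reported for the new AFQT.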

This regression equation was assumed to be valid in all populations.
Given this assumption, one can calculate the correlation of HOPT
in any population with any ASVAB subtest or composite, provided the
distribution of ASVAB subtest scores in that population is known. The
validities  of  OLD  and  NEW  in  the  reference  population  turned  out  to  be 
0.4475  and  0.4841,  respectively.  Thus,  the  increase  in  predictive 
validity  was  0.0366. 

The  applicant  sample  consisted  of  those  who  took  Form  15c  in  the 
Initial Operational Test and Evaluation (IOT&E) of ASVAB forms 11/12/13
in  1984.  It  has  been  shown  by  Maier  and  Hiatt  [3]  that,  by  this  time, 
scores  on  the  speeded  subtests  suffered  from  score  inflation  relative  to 
the  1980  reference  population.  Therefore,  NO  and  CS  scores  were 
adjusted  for  inflation  using  the  equating  approach  of  Maier  and  Hiatt. 
The  sample  size  was  15,065.  The  cut  scores  were  set  so  that  about 
90  percent  of  the  sample  would  be  selected  for  military  service.  The 
minimum  acceptable  scores  turned  out  to  be  -107  on  OLD  and  -106  on 
NEW.  The  corresponding  numbers  of  selected  applicants  were  13,578  and 
13,561.  All  calculations  are  made  for  illustration  only.  Random  errors 
of  sample  statistics  are  of  no  interest.  Therefore,  the  distinction 
between samples and populations will be ignored.

INCREMENTAL  VALIDITY 

The  phrase  "incremental  validity"  is  used  frequently  in  connection 
with  new  tests.  It  often  means  the  increase  in  multiple  correlation 
when  the  new  test  is  added  to  the  ASVAB.  This  meaning,  however,  is 
irrelevant to a utility analysis because DOD uses composite scores, not
multiple  regression,  in  selection  and  classification.  The  discussion  in 
this  memorandum  will  use  only  the  composites  OLD  and  NEW.  A  proper 
analysis  must  also  take  into  account  the  distinction  between  selection 
and classification, which makes the analysis very difficult. The
calculations in the CAT-ASVAB cost/benefit analysis [2] considered
selection only, and so will the formulas in this paper. "Incremental validity"
will  mean  the  increase  in  the  correlation  with  HOPT  on  replacing  OLD 
with  NEW. 

Any  correlation  depends  on  the  spread  of  scores  in  the  population, 
and  hence  cannot  be  assumed  to  be  the  same  in  the  reference  and 
applicant  populations.  It  seems  reasonable,  however,  to  assume  that  the 
regression  of  performance  on  the  composite  remains  the  same.  Given  this 
assumption,  and  the  variances  of  the  composite  in  the  two  populations, 
one  can  calculate  its  validity  in  the  applicant  population.  Because 
nothing  is  known  about  the  applicant  population  of  the  future,  a  1984 
applicant  sample  is  used  as  a  substitute.  Even  in  this  sample,  if  a  new 
test  were  really  being  evaluated,  nothing  would  be  known  about  the  new 
composite.  So,  at  first,  only  information  about  OLD  is  used. 

The  standard  deviation  of  OLD  in  the  applicant  sample  is  78.32. 
Assuming  that  NEW  has  the  same  spread,  validities  of  OLD  and  NEW  are 
0.3649  and  0.3976.  As  was  to  be  expected,  due  to  the  decrease  in 
standard  deviation  from  100  to  78.32,  these  validities  are  lower  than  in 
the  reference  population.  The  incremental  validity  in  the  applicant 
group  is  0.0327.  The  same  theory  yields  9.605  as  the  standard  deviation 
of  HOPT  among  applicants. 
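
The univariate adjustment described above can be sketched as follows, assuming (as the text does) that the regression slope and the residual variance of HOPT on the composite are the same in both populations. The function name `adjust_validity` is illustrative; the reference standard deviation of HOPT is 10, as stated earlier.

```python
import math


def adjust_validity(r_ref, sd_x_ref, sd_x_app, sd_y_ref=10.0):
    """Carry a validity from the reference to the applicant population.

    Assumes the same linear regression of performance on the composite
    (slope and residual variance) in both populations.
    Returns (validity among applicants, SD of performance among applicants).
    """
    slope = r_ref * sd_y_ref / sd_x_ref              # regression slope
    residual_var = sd_y_ref ** 2 * (1 - r_ref ** 2)  # unexplained variance
    var_y_app = slope ** 2 * sd_x_app ** 2 + residual_var
    sd_y_app = math.sqrt(var_y_app)
    return slope * sd_x_app / sd_y_app, sd_y_app


r_old, sd_hopt = adjust_validity(0.4475, 100, 78.32)
r_new, _ = adjust_validity(0.4841, 100, 78.32)  # NEW assumed to have OLD's spread
print(f"OLD {r_old:.4f}  NEW {r_new:.4f}  SD(HOPT) {sd_hopt:.3f}")
print(f"incremental validity {r_new - r_old:.4f}")
```

Under these assumptions the sketch reproduces the validities 0.3649 and 0.3976 and the HOPT standard deviation 9.605 quoted in the text. Note that the HOPT standard deviation is the one implied by OLD.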

Assume, as is done implicitly in [2], that composite distributions
among applicants are normal. The normalized z-score corresponding to
the selection ratio of 90 percent is -1.28, and the height of the
ordinate at this score is 0.176. Therefore, using equation 1.10 from
Cronbach and Gleser [4, p. 308], the performance gain per recruit is

G1 = 0.0327 (0.176/0.9) (9.605) = 0.0614 .  (4)
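
The normal-theory calculation behind G1 can be sketched with the standard library's `statistics.NormalDist`. The function name `normal_theory_gain` is illustrative. The sketch uses the unrounded cut score and ordinate, so its result (about 0.061) differs slightly from the 0.0614 obtained with the rounded ordinate 0.176.

```python
from statistics import NormalDist


def normal_theory_gain(incremental_validity, selection_ratio, sd_performance):
    """Gain per recruit when composite scores are assumed to be normal."""
    standard_normal = NormalDist()
    z_cut = standard_normal.inv_cdf(1 - selection_ratio)  # cut score in z units
    ordinate = standard_normal.pdf(z_cut)                 # density at the cut
    return incremental_validity * (ordinate / selection_ratio) * sd_performance


z_cut = NormalDist().inv_cdf(0.10)
print(f"z cut = {z_cut:.2f}, ordinate = {NormalDist().pdf(z_cut):.3f}")
print(f"G1 ~ {normal_theory_gain(0.0327, 0.90, 9.605):.4f}")
```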

Now,  discard  the  assumption  of  normality,  and  use  the  actual  means 
of  OLD  among  all  applicants  and  among  those  selected.  These  are  -0.915 
and  15.508,  respectively.  It  is  assumed  that  distributions  of  NEW  and 
OLD  have  the  same  shape.  When  the  factor  (0.176/0.9)  is  replaced  by  the 
difference  between  means  divided  by  the  standard  deviation  of  OLD, 
the  gain  per  recruit  is 

G2 = 0.0327 [(15.508 + 0.915)/78.32] (9.605) = 0.0659 .  (5)

Thus, the result does not change much when the assumption of normality
is dropped.

Now,  drop  the  assumption  that  distributions  of  OLD  and  NEW  are 
similar,  and  use  the  actual  sample  statistics  of  NEW.  The  standard 
deviation  is  78.91,  and  the  adjusted  validity  of  NEW  is  0.4001.  Hence, 
the  incremental  validity  is  0.0352.  The  means  among  total  and  selected 
groups are 0.981 and 16.835. Cronbach and Gleser's equation 1.10 now
yields 

G3 = [0.4001 (16.835 - 0.981)/78.91 - 0.3649 (15.508 + 0.915)/78.32] (9.605)
   = [0.080385 - 0.076516] (9.605) = 0.0372 .  (6)

Thus, the difference between the distributions of NEW and OLD is enough
to cut the performance gain almost by half. This happens even though
the two composites share three of their four subtests and the
incremental validity is higher than in G2.
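
The arithmetic behind G2 and G3 can be checked with a short sketch. The helper `selection_gain` is an illustrative name for one composite's term in the formula: validity, times the standardized difference between selected and total means, times the standard deviation of HOPT. All inputs are the sample statistics quoted in the text.

```python
def selection_gain(validity, mean_selected, mean_all, sd_composite, sd_performance):
    """Expected performance of selectees above the applicant mean."""
    return validity * (mean_selected - mean_all) / sd_composite * sd_performance


SD_HOPT = 9.605  # applicant-group SD of HOPT from the univariate adjustment

old_term = selection_gain(0.3649, 15.508, -0.915, 78.32, SD_HOPT)

# G2: NEW is assumed to have the same distribution shape and spread as OLD.
g2 = selection_gain(0.3976, 15.508, -0.915, 78.32, SD_HOPT) - old_term

# G3: NEW uses its own sample mean shift, spread, and adjusted validity.
g3 = selection_gain(0.4001, 16.835, 0.981, 78.91, SD_HOPT) - old_term

print(f"G2 = {g2:.4f}, G3 = {g3:.4f}, G3/G2 = {g3 / g2:.2f}")
```

The standardized mean shift is about 0.210 for OLD but only about 0.201 for NEW. That small difference, multiplied by validities near 0.4, offsets much of the incremental validity.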

MULTIPLE  REGRESSION 

When  the  stronger  univariate  assumptions  are  discarded  and  only  the 
multiple  regression  equation  3  is  used  instead,  the  standard  deviation 
of  HOPT  is  9.682,  and  the  validities  of  OLD  and  NEW  are  0.3797  and 
0.4390.  With  these  validities  in  the  formula  above,  the  gain  is 

G4 = [0.4390 (15.854/78.91) - 0.3797 (16.423/78.32)] (9.682)
   = [0.088201 - 0.079620] (9.682) = 0.0831 .  (7)

This  is  substantially  higher  than  any  of  the  previous  estimates.  The 
increase  is  due  to  the  fact  that  the  true  incremental  validity  among 
applicants  is  0.0593  rather  than  0.0327  or  0.0352. 

Now, ignore the composites altogether, and use the full multiple
regression to calculate the gain. Mean HOPT is 41.6058 among those
selected using OLD and 41.7215 among those selected using NEW. Thus,
the actual performance gain per recruit is

G5 = 41.7215 - 41.6058 = 0.1157 .  (8)

This is higher even than G4, about twice as large as G1 and G2, and over
three times G3.
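
The five estimates can be put side by side with a short sketch. G4 and G5 are recomputed from the figures quoted in the text, while G1 to G3 are taken as reported. Variable names are illustrative.

```python
SD_HOPT_MR = 9.682  # SD of HOPT among applicants from the multiple regression

shift_new = (16.835 - 0.981) / 78.91  # standardized mean shift on NEW
shift_old = (15.508 + 0.915) / 78.32  # standardized mean shift on OLD

g4 = (0.4390 * shift_new - 0.3797 * shift_old) * SD_HOPT_MR
g5 = 41.7215 - 41.6058

gains = {"G1": 0.0614, "G2": 0.0659, "G3": 0.0372, "G4": g4, "G5": g5}
ratios = {name: g5 / gain for name, gain in gains.items()}

for name, gain in gains.items():
    print(f"{name} = {gain:.4f}   G5/{name} = {ratios[name]:.2f}")
```

On these figures G5 is roughly 1.4 times G4, a little under twice G1 and G2, and a little over three times G3.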

INTERPRETATION 

Old and new AFQT have three subtests in common: AR, WK, and PC.
It is reasonable to assume that distributions of their scores have the
same shape. Yet this assumption is wrong, and its effect is that gain
G2 is about twice as big as G3. This issue, however, is not very
important  because  assumptions  about  the  shapes  of  distributions  are  not 
central  to  discussions  of  validity  and  utility  in  selection. 

The distinction between estimates G1 to G3 on the one hand and G4
and G5 on the other deserves careful attention. The former are based on
simple  regression  on  a  single  composite  at  a  time,  while  the  latter  use 
multiple  regression  on  all  subtests. 

In  discussions  of  validity  studies,  it  is  customary  to  correct 
correlations  to  the  reference  population,  and  then  evaluate  composite 
validities  obtained  from  the  corrected  correlation  matrix.  If  the 
correlation  (or  covariance)  matrix  is  the  basis  of  all  calculations,  it 
is  impossible  to  detect  any  nonlinearity  in  the  regression  of  the 
criterion  on  the  composite.  It  is  important  to  note  that  linear 
regression  on  subtests  (or  even  on  the  two  composites  OLD  and  NEW)  does 
not  guarantee  linear  regression  on  a  single  composite.  In  the  reference 
population,  when  the  square  of  OLD  was  included  in  the  regression 
equation,  it  explained  more  than  5  percent  as  much  variance  as  was 
explained  by  the  linear  term.  Usually,  a  contribution  of  this  size  is 
safe  to  ignore.  When  one  is  studying  the  value  of  incremental  validity, 
however,  the  quantities  of  interest  are  themselves  quite  small. 
Therefore,  one  must  not  ignore  other  small  influences  such  as  quadratic 
terms  in  regression  equations. 

Emphasis  on  the  reference  population  tends  to  divert  attention  away 
from  the  various  ways  in  which  it  differs  from  the  applicant  population. 
It  is  reasonable  to  assume  that  the  regression  of  HOPT  on  a  composite  is 
the same in both populations, even if correlations differ because
variances are different. This assumption leads to the univariate
adjustments of validity used in estimates G1 to G3. As has been seen,
the assumption is incorrect. In the present example, the actual
validities in the applicant group turned out to be higher than the
simple estimates, but they might be lower in another situation.
Therefore, given the incremental validity of a new composite in the
reference  population,  one  does  not  know  what  it  will  be  among 
applicants.  For  utility  analysis,  validity  in  the  applicant  population 
is  what  matters. 

Even within the applicant sample, performance gain increased by
over a third from G4 to G5. To understand why a small quadratic term
can have such an effect, it is instructive to see where the performance
gain  comes  from.  Even  though  the  formulas  express  the  gain  in  terms  of 
mean  scores,  the  actual  gain  is  not  distributed  throughout  the  entire 
selected  group.  The  gain  comes  from  the  superior  performance  of  those 
who  qualify  on  NEW  but  not  on  OLD,  over  those  who  qualify  on  OLD  but  not 
on NEW. When the correlation between OLD and NEW is high, these two
groups  contain  only  a  small  fraction  of  the  total  applicant  sample.  The 
correlation  between  old  and  new  AFQTs  in  the  1984  applicant  sample  is 
0.953.  Therefore,  most  applicants  who  meet  the  OLD  requirement  also 
meet  the  NEW  requirement. 

Of the total of 15,065 applicants, 305 qualify on OLD but not on
NEW, with mean OLD, NEW, and HOPT scores of -88.6, -119.1, and 42.83.
The corresponding means are -121.6, -88.4, and 47.79 for the 288
applicants who qualify on NEW but not on OLD. The entire performance
gain due to the change in the AFQT comes from the mean difference
47.79 - 42.83 = 4.96 between these two groups, each of which contains
only  about  2  percent  of  the  total  sample.  Both  subgroups  are  near  the 
low  end  of  the  score  distribution.  The  means  of  OLD  and  NEW  in  these 
subgroups  differ  by  over  30  points;  in  the  total  sample,  the  means  are 
almost  equal  and  the  standard  deviation  of  the  difference  is  24.  Thus, 
the  applicants  in  these  subgroups  have  highly  unusual  patterns  of 
scores.  One  cannot  be  confident  that  relationships  that  hold  in  the 
total  sample  are  valid  in  these  subgroups  as  well.  Influences  such  as 
quadratic  terms  in  regression,  whose  effects  appear  small  in  the  total 
sample,  can  have  a  major  effect  on  the  mean  HOPT  in  these  subgroups. 

CONCLUSIONS 

Calculations in the 1984 IOT&E sample show that the performance
gain formulas are very sensitive to small violations of their
assumptions. The gain dropped by almost half from G2 to G3 because the
distributions of OLD and NEW differ in shape, even though only one of
four subtests has been changed. It more than doubled from G3 to G4
because simple regressions on the composites are different in the
reference and applicant populations. It increased from G4 to G5 by over
a third because of small nonlinear components in the regressions of HOPT
on OLD and NEW. If the changes from G3 to G5 had been negative instead
of positive, use of NEW would have lowered mean performance despite the
increased validity in the reference population. The appendix
illustrates this possibility with simulated data.

The  results  in  this  study  came  from  applying  one  regression 
equation  (equation  3)  to  one  large  applicant  sample.  This  does  not 
weaken  the  conclusions.  A  single  example  suffices  to  show  what  can 
happen,  and  thus  undermines  confidence  in  simple  formulas.  All  that  is 
required  is  for  the  regression  equation  and  the  data  set  to  be 
realistic.  (Confidence  in  the  formulas  becomes  even  weaker  if  one 
rejects the assumption that equation 3 holds for all applicants,
including those who qualify on one composite but not the other.) Therefore,
the  benefit  estimates  in  tab  E  of  the  CAT-ASVAB  cost/benefit  analysis 
[2],  based  on  the  Cronbach-Gleser  formula,  are  not  dependable  enough  to 
be  useful  in  making  operational  decisions. 

APPENDIX

The following simulation was performed to show that mean
performance can indeed go down when the predictive validity of the
composite is increased. Simulated OLD scores were standard normal
variates multiplied by 100. Correlated normal scores were generated
with the equation


X = 0.95 OLD + E ,

where  E  was  normal  with  a  mean  of  0  and  a  standard  deviation  of  30. 

These  were  converted  into  NEW  scores  using 

NEW = X (1 + X/500) - 20 .

These NEW scores had a positively skewed distribution with skewness of
1.1. For each examinee, an HOPT score was computed as

HOPT = 50 + 0.025 OLD + 0.035 NEW + E' ,

where  E'  was  normal  with  a  mean  of  0  and  a  standard  deviation  of  8  so 
that  the  standard  deviation  of  HOPT  was  10.  The  number  of  simulated 
examinees  was  20,000. 

The validities of OLD and NEW were found to be 0.584 and 0.593, so
that NEW had higher predictive validity. HOPT was regressed on OLD and
on NEW, with squares of the composite scores included as predictors.

For  OLD,  the  quadratic  term  explained  2.35  percent  as  much  variance  as 
the  linear  term  did.  For  NEW,  this  percentage  was  only  0.75.  Note  that 
these  nonlinear  effects  were  found  even  though  true  multiple  regression 
on  the  two  composites  was  strictly  linear. 

As  in  the  1984  sample,  each  examinee  was  selected  or  rejected  using 
OLD  and  using  NEW  with  a  selection  ratio  of  90  percent.  Mean  HOPT 
scores were computed for those selected with OLD but not with NEW, and
vice versa. Mean HOPT values were 43.08 for those selected with OLD
only and 42.76 for those selected with NEW only. Although the difference
between these numbers is small, the important point is that using OLD
yields higher performance even though NEW has higher validity.
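
The simulation described in this appendix can be sketched as follows. This is a reconstruction under stated assumptions, not the original program: the random number generator, the seed, and the helper names `pearson`, `cut_score`, and `simulate` are all illustrative, so individual runs will not reproduce the reported figures digit for digit.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cut_score(scores, selection_ratio):
    """Lowest score that still falls in the selected top fraction."""
    ranked = sorted(scores)
    return ranked[int(round((1 - selection_ratio) * len(ranked)))]


def simulate(n=20_000, selection_ratio=0.90, seed=0):
    rng = random.Random(seed)  # seed is an arbitrary choice
    old = [100 * rng.gauss(0, 1) for _ in range(n)]
    x = [0.95 * o + rng.gauss(0, 30) for o in old]       # X = 0.95 OLD + E
    new = [xi * (1 + xi / 500) - 20 for xi in x]         # skewed NEW scores
    hopt = [50 + 0.025 * o + 0.035 * w + rng.gauss(0, 8)
            for o, w in zip(old, new)]

    cut_old = cut_score(old, selection_ratio)
    cut_new = cut_score(new, selection_ratio)
    # Examinees selected by exactly one of the two composites.
    old_only = [h for o, w, h in zip(old, new, hopt) if o >= cut_old and w < cut_new]
    new_only = [h for o, w, h in zip(old, new, hopt) if w >= cut_new and o < cut_old]
    return {
        "validity_old": pearson(old, hopt),
        "validity_new": pearson(new, hopt),
        "n_old_only": len(old_only),
        "n_new_only": len(new_only),
        "mean_hopt_old_only": sum(old_only) / len(old_only),
        "mean_hopt_new_only": sum(new_only) / len(new_only),
    }


for key, value in simulate().items():
    print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
```

Only a few percent of examinees fall in either of the two disjoint subgroups, so the difference between their mean HOPT scores is noisy in any single run. Increasing `n` or repeating over several seeds gives a steadier picture.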